AFLEX
Section: User Commands (1)
Updated: 1 September 1990
NAME
aflex - fast lexical analyzer generator for Ada
SYNOPSIS
aflex [ -bdfipstvILT -Sskeleton_file ] [ filename ]
DESCRIPTION
aflex is a version of the Unix tool lex, but it is written in Ada and
generates scanners in Ada. It is upwardly compatible with the UCI tool
alex, but is much faster and generates smaller scanners.
OPTIONS
Command line options are given in a different format than in the
old UCI alex. The aflex options are as follows:
- -t
-
Write the scanner output to the standard output rather than to a file.
The default name of the scanner file for base.l is base.a. Note that this
option is not as useful with aflex because, in addition to the scanner
file, there are files for the externally visible DFA functions
(base_dfa.a) and the external IO functions (base_io.a).
- -b
-
Generate backtracking information to aflex.backtrack.
This is a list of scanner states which require backtracking
and the input characters on which they do so. By adding rules one
can remove backtracking states. If all backtracking states
are eliminated and -f is used, the generated scanner will run faster
(see the -p flag). Only users who wish to squeeze every last cycle out
of their scanners need worry about this option.
- -d
-
makes the generated scanner run in debug mode. Whenever a pattern is
recognized the scanner will write to stderr a line of the form:
--accepting rule #n
Rules are numbered sequentially with the first one being 1. Rule #0
is executed when the scanner backtracks; Rule #(n+1) (where n is the
number of rules) indicates the default action; Rule #(n+2) indicates
that the input buffer is empty and needs to be refilled and then the scan
restarted. Rules beyond (n+2) are end-of-file actions.
- -f
-
has the same effect as lex's -f flag (do not compress the scanner
tables); the mnemonic changes from "fast compilation" to (take your pick)
"full table" or "fast scanner". The actual compilation takes longer,
since aflex is I/O bound writing out the big table.
The compilation of the Ada file containing the scanner is also likely
to take a long time because of the large arrays generated.
- -i
-
instructs aflex to generate a case-insensitive scanner. The case of
letters given in the aflex input patterns will be ignored, and the rules
will be matched regardless of case. The matched text given in yytext
will keep its original case (i.e., it will not be folded).
- -p
-
generates a performance report to stderr. The report
consists of comments regarding features of the aflex input file
which will cause a loss of performance in the resulting scanner.
Note that the use of the ^ operator and the -I flag entail minor
performance penalties.
- -s
-
causes the default rule (that unmatched scanner input is echoed to
stdout) to be suppressed. If the scanner encounters input that does not
match any of its rules, it aborts with an error. This option is
useful for finding holes in a scanner's rule set.
- -v
-
has the same meaning as for lex (print to stderr a summary of
statistics of the generated scanner). Many more statistics
are printed, though, and the summary spans several lines. Most
of the statistics are meaningless to the casual aflex user, but the
first line identifies the version of aflex, which is useful for figuring
out where you stand with respect to patches and new releases.
- -I
-
instructs aflex to generate an interactive scanner. Normally, scanners
generated by aflex always look ahead one character before deciding that
a rule has been matched. At the cost of some scanning overhead, aflex
will generate a scanner which only looks ahead when needed. Such scanners
are called interactive because if you want to write a scanner for an
interactive system such as a command shell, you will probably want the
user's input to be terminated with a newline, and without -I the user
will have to type a character in addition to the newline in order
to have the newline recognized. This leads to dreadful interactive
performance.
-
If all this seems too confusing, here's the general rule: if a human will
be typing in input to your scanner, use -I, otherwise don't; if you don't
care about how fast your scanners run and don't want to make any
assumptions about the input to your scanner, always use -I.
-
Note that -I cannot be used in conjunction with full tables, i.e., the
-f flag.
- -L
-
instructs aflex to not generate #line directives (see below).
- -T
-
makes aflex run in trace mode. It will generate a lot of messages to
stdout concerning the form of the input and the resultant
non-deterministic and deterministic finite automata. This option is
mostly for use in maintaining aflex.
- -Sskeleton_file
-
overrides the default internal skeleton from which aflex constructs
its scanners. You'll probably never need this option unless you are doing
aflex maintenance or development.
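The rule numbers reported by a scanner built with -d can be read off
mechanically. The following is a minimal Python sketch of that numbering,
not part of aflex itself; the function name classify_debug_rule and the
returned labels are made up for illustration, and num_rules stands for
the n used in the description of -d above.

```python
def classify_debug_rule(reported, num_rules):
    """Interpret the number printed in an "--accepting rule #n" line.

    reported  -- the rule number printed by the -d scanner
    num_rules -- how many rules the aflex input contains (hypothetical
                 parameter; the scanner itself does not print this)
    """
    if reported < 0 or num_rules < 0:
        raise ValueError("rule numbers and rule counts cannot be negative")
    if reported == 0:
        return "backtrack"            # Rule #0: the scanner backtracks
    if reported <= num_rules:
        return "user rule"            # Rules 1..n, in input order
    if reported == num_rules + 1:
        return "default action"       # Rule #(n+1)
    if reported == num_rules + 2:
        return "refill input buffer"  # Rule #(n+2): refill, then rescan
    return "end-of-file action"       # anything beyond n+2
```

For an input file with five rules, a reported rule of 6 would therefore be
the default action and anything above 7 an end-of-file action.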
INCOMPATIBILITIES WITH LEX
aflex is fully compatible with lex with the following exceptions:
- -
-
Source file format:
The input specification file for aflex must use the following format.
definitions section
%%
rules section
%%
user defined section
##
user defined section
- -
-
lex's %r (Ratfor scanners) and %t (translation table) options
are not supported.
- -
-
The do-nothing -n flag is not supported.
- -
-
When definitions are expanded, aflex encloses them in parentheses.
With lex, the following
NAME [A-Z][A-Z0-9]*
%%
foo{NAME}? text_io.put_line( "Found it" );
%%
will not match the string "foo" because when the macro
is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
and the precedence is such that the '?' is associated with
"[A-Z0-9]*". With aflex, the rule will be expanded to
"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
Note that because of this, the ^, $, <s>, and / operators cannot be
used in a definition.
- -
-
Input can be controlled by redefining the YY_INPUT function.
YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
action is to place up to max_size characters in the character buffer "buf"
and return in the integer variable "result" either the
number of characters read or the constant YY_NULL
to indicate EOF. The default YY_INPUT reads from Standard_Input.
You can also add things like keeping track of the input line number
this way, but don't expect your scanner to go very fast.
- -
-
Yytext is a function returning a vstring.
- -
-
aflex reads only one input file, while lex's input is made
up of the concatenation of its input files.
- -
-
The following lex constructs are not supported:
- REJECT
- %T -- character set tables
- %x -- changes to internal array sizes (see below)
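The effect of parenthesizing definitions when they are expanded can be
sketched in Python. This is an illustration only, not how aflex is
implemented: the helper expand and its parenthesize switch are invented
for the example, and Python's re module stands in for the scanner's
pattern matching. Note that Python reads "*?" as a lazy star rather than
as '?' applied to "[A-Z0-9]*", but the outcome for the string "foo" is
the same: the leading "[A-Z]" remains mandatory.

```python
import re

# The definition from the example above: NAME [A-Z][A-Z0-9]*
DEFINITIONS = {"NAME": "[A-Z][A-Z0-9]*"}


def expand(pattern, definitions, parenthesize):
    """Replace each {NAME} reference with the text of its definition.

    With parenthesize=True the definition is wrapped in parentheses, as
    the text says aflex does; with False it is pasted in verbatim, as the
    text says lex does.
    """
    def substitute(match):
        body = definitions[match.group(1)]
        return "(" + body + ")" if parenthesize else body

    # Names start with a letter or underscore, so {n} repeat counts are
    # left untouched.
    return re.sub(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", substitute, pattern)


LEX_STYLE = expand("foo{NAME}?", DEFINITIONS, parenthesize=False)
AFLEX_STYLE = expand("foo{NAME}?", DEFINITIONS, parenthesize=True)

# Only the parenthesized form lets the whole NAME part be optional.
LEX_MATCHES_FOO = re.fullmatch(LEX_STYLE, "foo") is not None
AFLEX_MATCHES_FOO = re.fullmatch(AFLEX_STYLE, "foo") is not None
```

Both expansions still accept an input such as "fooBAR"; they differ only
on inputs where the definition is absent altogether.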
ENHANCEMENTS
- -
-
Exclusive start-conditions can be declared by using %x instead of %s.
These start-conditions have the property that when they are active,
no other rules are active.
Thus a set of rules governed by the same exclusive start condition
describes a scanner which is independent of any of the other rules in
the aflex input. This feature makes it easy to specify "mini-scanners"
which scan portions of the input that are syntactically different
from the rest (e.g., comments).
- -
-
End-of-file rules.
The special rule "<<EOF>>" indicates
actions which are to be taken when an end-of-file is
encountered and yywrap() returns non-zero (i.e., indicates
no further files to process). The action can either call
text_io.set_input() to switch to a new file to process, in which case
the action should finish with YY_NEW_FILE (this is a branch, so
subsequent code in the action won't be executed), or it should finish
with a return statement. <<EOF>> rules may not be used with other
patterns; they may only be qualified with a list of start
conditions. If an unqualified <<EOF>> rule is given, it
applies only to the INITIAL start condition, and not to %s
start conditions.
These rules are useful for catching things like unclosed comments.
An example:
%x quote
%%
...
<quote><<EOF>> {
    error( "unterminated quote" );
}
<<EOF>> {
    set_input( next_file );
    YY_NEW_FILE;
}
- -
-
aflex dynamically resizes its internal tables, so directives like "%a 3000"
are not needed when specifying large scanners.
- -
-
aflex generates --#line comments mapping lines in the output to
their origin in the input file.
- -
-
All actions must be enclosed by curly braces.
- -
-
Comments may be put in the first section of the input by preceding
them with '#'.
- -
-
Ada style comments are supported instead of C style comments.
- -
-
All template files are internalized.
- -
-
The input source file must end with a ".l" extension.
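The difference between %s and %x start conditions described above can be
modelled in a few lines of Python. This is a sketch under assumptions, not
aflex code: the Rule class and active_rules function are invented for the
example, and it assumes that, as in lex, rules without a <...> prefix stay
active inside ordinary (%s) start conditions. It covers pattern rules
only; unqualified <<EOF>> rules follow the narrower behavior described
above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    # Start conditions named in the rule's <...> prefix; empty means the
    # rule was written without a prefix.
    conditions: frozenset = frozenset()


def active_rules(rules, current, exclusive):
    """Return the names of rules that may match in start condition `current`.

    exclusive is the set of conditions declared with %x; every other
    condition (including INITIAL) is treated as inclusive.
    """
    active = []
    for rule in rules:
        if current in rule.conditions:
            # Explicitly qualified for the current condition.
            active.append(rule.name)
        elif not rule.conditions and current not in exclusive:
            # Unqualified rules are shut out only by exclusive conditions.
            active.append(rule.name)
    return active


RULES = [
    Rule("identifier"),
    Rule("comment_end", frozenset({"comment"})),
    Rule("quoted_text", frozenset({"quote"})),
]
EXCLUSIVE = {"quote"}  # as if declared with: %x quote
```

With these declarations, only quoted_text is active while the scanner is
in the quote condition, which is what makes an exclusive condition behave
like an independent mini-scanner.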
FILES
The names of the files containing the generated scanner, IO,
and DFA packages are based on the basename of the input file.
For example, if the input file is called scan.l then the
scanner file is called scan.a, the DFA package is in scan_dfa.a, and
scan_io.a is the IO package file. All of these file names may be changed
by modifying the external_file_manager package (see the porting notes
for more information).
- aflex.backtrack
-
backtracking information for -b
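The default naming scheme can be written down as a small Python helper.
This is only a sketch of the convention stated above; the function name
generated_files is made up, and it deliberately returns bare file names
because the text does not say in which directory aflex writes its output.

```python
from pathlib import Path


def generated_files(input_file):
    """Return the default output file names for an aflex input file."""
    path = Path(input_file)
    if path.suffix != ".l":
        # The input source file must end with a ".l" extension.
        raise ValueError("aflex input files must end in .l")
    base = path.stem  # "scan" for "scan.l"
    return {
        "scanner": base + ".a",
        "dfa": base + "_dfa.a",
        "io": base + "_io.a",
    }
```

A build script could use such a helper to list the files that need to be
compiled after running aflex, for example generated_files("scan.l").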
SEE ALSO
lex(1)
M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator,
Computing Science Technical Report 39, Bell Telephone
Laboratories, Murray Hill, NJ, 1975.
Military Standard Ada Programming Language (ANSI/MIL-STD-1815A-1983),
American National Standards Institute, January 1983.
T. Nguyen and K. Forester, Alex - An Ada Lexical Analysis Generator,
Arcadia Document UCI-88-17, University of California, Irvine, 1988.
D. Taback and D. Tolani, Ayacc User's Manual,
Arcadia Document UCI-85-10, University of California, Irvine, 1986.
AUTHOR
John Self. Based on the tool flex written and designed by
Vern Paxson. It reimplements the functionality of the tool alex
designed by Thieu Q. Nguyen.
Send requests for aflex information to alex-info@ics.uci.edu
Send bug reports for aflex to alex-bugs@ics.uci.edu
DIAGNOSTICS
aflex scanner jammed -
a scanner compiled with -s has encountered an input string which
wasn't matched by any of its rules.
old-style lex command ignored -
the aflex input contains a lex command (e.g., "%n 1000") which
is being ignored.
BUGS
Some trailing context patterns cannot be properly matched and generate
warning messages ("Dangerous trailing context"). These are
patterns where the ending of the
first part of the rule matches the beginning of the second
part, such as "zx*/xy*", where the 'x*' matches the 'x' at
the beginning of the trailing context. (Lex doesn't get these
patterns right either.)
Variable trailing context (where neither the leading nor the trailing
part has a fixed length) entails a substantial performance loss.
For some trailing context rules, parts which are actually fixed-length are
not recognized as such, leading to the above-mentioned performance loss.
In particular, parts using '|' or {n} are always considered variable-length.
Nulls are not allowed in aflex inputs or in the inputs to
scanners generated by aflex. Their presence generates fatal
errors.
Pushing back definitions enclosed in ()'s can result in nasty,
difficult-to-understand problems like:
{DIG} [0-9] -- a digit
in which the pushed-back text is "([0-9] -- a digit)".
Due to both buffering of input and read-ahead, you cannot intermix
calls to text_io routines, such as text_io.get(), with aflex rules and
expect it to work. Call input() instead.
There are still more features that could be implemented (especially
REJECT). Also, the speed of the compressed scanners could be improved.
The utility needs more complete documentation.